Preface
This document is a computational notebook supplement to TBD (Zloteanu & Vuorre, TBD). We introduce Signal Detection Theory (SDT) and show how to apply it in practice in the area of deception research. Moreover, we implement SDT models as Generalized Linear Mixed Models (GLMM) in a bayesian inferential framework in R. This document is a Quarto computational notebook. You can download the source code at TBD and reproduce the analyses in RStudio.
R environment
First, ensure that your current working directory contains both the analysis script and the data file. It is best to download the whole project directory from TBD. If you use RStudio, open up the relevant R Project (deception-sdt.Rproj) in RStudio.
Then, make sure that you have the required R packages installed. We use renv to ensure that you can install and use the exact same versions of the packages. To install those packages, execute renv::restore() in the R console (you only need to do this once).
You can now load the required packages (click “Code” to show code here, and in code blocks in the rest of the document).
We then set some options for the document output.
Code
theme_set(
theme_linedraw() +
theme(panel.grid = element_blank())
)
bayesplot::color_scheme_set(scheme = "brewer-Spectral")
options(digits = 3)Next we set options for the bayesian model estimation procedures. We use as many cores as are available on the machine.
Code
options(
mc.cores = parallel::detectCores(logical = FALSE)
)
dir.create("models", FALSE)Preliminaries
Data
For this tutorial, we use a synthetic copy of data discussed in XYZ. Note that all categorical predictors are R factors, and the outcome is an integer or float (0s and 1s).
| Participant | Training | Stimulus | LieType | Veracity | Answer |
|---|---|---|---|---|---|
| 029 | Emotion | 1-1 | Experiential | True | 1 |
| 029 | Emotion | 2-2 | Experiential | False | 0 |
| 029 | Emotion | 3-2 | Experiential | False | 0 |
| 029 | Emotion | 4-1 | Experiential | True | 1 |
| 029 | Emotion | 6-1 | Experiential | False | 1 |
| 029 | Emotion | 7-1 | Experiential | False | 0 |
-
Trainingis a between-subjects variable indicating which training group the participant was in, -
LieTypeis a within-subjects variable; what type of lie was presented on the trial -
Stimulusindicates which stimulus was presented -
Veracityindicates the stimulus veracity (True or False) -
Answeris the DV, as each answer participants give; truth=1, lie=0
Non-SDT analysis
We first analyse these data with correct / incorrect classifications. Typically, this amounts to aggregating each person’s data to proportions correct (in different conditions / groups), and then visualizing and modelling those. For this example, we ignore any groups and conditions.
First, we create a new variable encoding response accuracy.
Code
d <- d %>%
mutate(
Accuracy = as.integer(as.integer(Veracity) - 1 == Answer)
)
head(d)| Participant | Training | Stimulus | LieType | Veracity | Answer | Accuracy |
|---|---|---|---|---|---|---|
| 029 | Emotion | 1-1 | Experiential | True | 1 | 1 |
| 029 | Emotion | 2-2 | Experiential | False | 0 | 1 |
| 029 | Emotion | 3-2 | Experiential | False | 0 | 1 |
| 029 | Emotion | 4-1 | Experiential | True | 1 | 1 |
| 029 | Emotion | 6-1 | Experiential | False | 1 | 0 |
| 029 | Emotion | 7-1 | Experiential | False | 0 | 1 |
We then summarize the data to proportion correct for each person.
We can then display these proportions correct as a simple histogram.
Code
d_accuracy_participant %>%
ggplot(aes(Accuracy)) +
scale_x_continuous(
"Percent correct",
limits = c(0, 1),
labels = percent
) +
scale_y_continuous(
"Count",
expand = expansion(c(0, .05))
) +
geom_histogram(
color = "black",
fill = "grey80"
)On average, a t-test shows that participants are at chance:
Code
t.test(d_accuracy_participant$Accuracy, mu = 0.5) %>%
parameters()| Parameter | Mean | mu | Difference | CI | CI_low | CI_high | t | df_error | p | Method | Alternative |
|---|---|---|---|---|---|---|---|---|---|---|---|
| d_accuracy_participant$Accuracy | 0.506 | 0.5 | 0.006 | 0.95 | 0.487 | 0.524 | 0.607 | 105 | 0.545 | One Sample t-test | two.sided |
However, this simple measure of accuracy can conflate any potential bias that participants might have (to e.g. prefer “true” over “false” responses) with their ability to tell True stimuli apart from False stimuli. Signal Detection Theory allows studying bias and ability to detect true/falsehoods separetely.
Signal Detection Theory
Introduction
At the heart of Signal Detection Theory is the idea that following an observation period during which a stimulus may or may not have been presented, an observer evaluates her internal degree of evidence for whether a stimulus was presented. Because of noise in the perceptual-cognitive apparatus, the latent evidence signals are not always simply absent (for observation periods, or generally trials, with no stimulus) or present (for trials with a stimulus), but instead assumed to fall on a continuum. She then has or develops some kind of inner criterion which this latent evidence must exceed for her to report “Yes, signal was present during the observation period”. On each trial, a random value from the latent evidence continuum is realized, and the participant’s decision task is simple: If this particular evidence value is greater than the criterion, the participant responds “Yes.”
SDT models are applied (sometimes routinely) in a wide range of research areas. They are very popular in, for example, perception and memory research. Depending on the task at hand and concept under study, the mysterious quantity “latent evidence” is often labelled differently. For instance, in memory research it is often called “memory strength” or “familiarity”.
In the current application, we are interested in judgments of veracity, given some stimulus that is either truthful or not. In other words, participants are presented on each trial with a stimulus that is either False or True, and their task is to decide whether they thought the stimulus was false or true by responding with either “false” or “true”. We might therefore, in this application, call this hypothesized latent quantity as, perhaps whimsically, truthiness.
There are then hypothesized to be two truthiness distributions: One for False stimuli, and one for True stimuli. SDT allows us to examine the degree to which False and True stimuli lead to different truthiness distributions.
If participants are completely unable to detect any difference between False and True stimuli, these two distributions should be identical. Alternatively, if participants somehow are sensitive to the veracity of the stimulus, False and True stimuli should lead to an internal truthiness distribution for True trials that has a greater average value than does the internal truthiness distribution following False trials.
In addition, to make a decision the participant needs some criterion to which each truthiness signal is compared to. SDT allows us to examine where participants set this criterion. For example, it may be of interest whether participants show a liberal criterion, and are likely to respond “true” to just about anything.
One of the great benefits of SDT is that it allows us to separate sensitivity and bias in examining task performance.
We visualize the basic SDT framework for deception tasks in Figure 2. This figure shows two internal truthiness distributions with some made-up parameter values. In red is the distribution for False trials: Sometimes the realized truthiness value on that trial is low, sometimes high, but importantly there is some variation. In blue is the truthiness distribution for True trials: Sometimes the sampled value is high, sometimes low, but in this example it has a greater mean than the distribution for False trials.
Code
sdt_draw <- function(d, crit) {
tibble(
d = c(0, d),
Veracity = factor(d, labels = c("False", "True"))
) %>%
ggplot() +
scale_y_continuous(
"Density",
expand = expansion(c(0, 0.005))
) +
scale_x_continuous(
"Truthiness"
) +
scale_color_brewer(
"Veracity",
palette = "Set1",
aesthetics = c("color")
) +
geom_vline(
xintercept = crit,
linewidth = 0.5
) +
stat_slab(
aes(
xdist = dist_normal(d),
color = Veracity,
fill = after_scale(alpha(color, .25))
),
p_limits = c(0.0001, 0.9999)
) +
# Criterion
annotate(
geom = "curve",
ncp = 9, curvature = -0.3,
x = 3, xend = crit, y = 0.5, yend = 0.2,
arrow = arrow(length = unit(0.5, "lines"), type = "closed", angle = 20)
) +
annotate(
geom = "text",
x = 3, y = 0.53,
label = "Criterion"
) +
# Sensitivity
annotate(
geom = "segment",
x = d, xend = 0.0, y = 0.901, yend = 0.901,
arrow = arrow(ends = "both", length = unit(0.5, "lines"), type = "open", angle = 90)
) +
annotate(
geom = "curve",
ncp = 9, curvature = 0.4, angle = 135,
x = 3.3, xend = mean(c(0, d)), y = 0.901, yend = 0.901,
arrow = arrow(length = unit(0.5, "lines"), type = "closed", angle = 20)
) +
annotate(
geom = "text",
x = 3.3, y = 0.87,
label = "Sensitivity"
) +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank()
)
}
sdt_draw(1, 0.3)Although SDT-based analyses might then seem theory-laden, in fact this model is statistically widely used for binary responses with some binary predictor value.
Practice
In practice, the parameters of this basic SDT model can be calculated from two proportions. To calculate those proportions, we first classify each trial into four different categories: Participants might correctly respond “true” to True (“hit”) and “false” to False (“correct rejection”) stimuli, and incorrectly respond “true” to False (“false alarm”), and “false” to True (“miss”) stimuli. We now classify each trial in the example data to one of these four categories.
Code
| Participant | Training | Stimulus | LieType | Veracity | Answer | Accuracy | Type |
|---|---|---|---|---|---|---|---|
| 029 | Emotion | 1-1 | Experiential | True | 1 | 1 | hit |
| 029 | Emotion | 2-2 | Experiential | False | 0 | 1 | cr |
| 029 | Emotion | 3-2 | Experiential | False | 0 | 1 | cr |
| 029 | Emotion | 4-1 | Experiential | True | 1 | 1 | hit |
| 029 | Emotion | 6-1 | Experiential | False | 1 | 0 | fa |
| 029 | Emotion | 7-1 | Experiential | False | 0 | 1 | cr |
We leave it as an excercise to the reader to find each classification above in Figure 2.
To calculate the SDT parameters, we must then aggregate the data to per-participant counts of these four categories. For computations, we also pivot the data to a “wide” format.
Code
# Aggregate per participant
sdt <- sdt %>%
count(Participant, Type) %>%
pivot_wider(names_from = Type, values_from = n)
head(sdt)| Participant | cr | fa | hit | miss |
|---|---|---|---|---|
| 001 | 7 | 13 | 10 | 10 |
| 002 | 6 | 14 | 12 | 8 |
| 003 | 10 | 10 | 9 | 11 |
| 004 | 8 | 12 | 10 | 10 |
| 005 | 4 | 16 | 13 | 7 |
| 006 | 11 | 9 | 13 | 7 |
The hit rate is then the proportion of “hits” over the sum of “hits” and “misses”. The false alarm rate is the proportion of “false alarms” over the sum of false alarms and correct rejections. Those rates are then “z-scored” to establish the latent continuum as a standard normal distribution. Sensitivity, often called d-prime, is then the distance between the distributions’ means. The response criterion is calculated as (negative) half of the sum of the z-scores. In practice, we calculate all these in R as follows:
Code
| Participant | cr | fa | hit | miss | hr | far | zhr | zfa | dprime | crit |
|---|---|---|---|---|---|---|---|---|---|---|
| 001 | 7 | 13 | 10 | 10 | 0.50 | 0.65 | 0.000 | 0.385 | -0.385 | -0.193 |
| 002 | 6 | 14 | 12 | 8 | 0.60 | 0.70 | 0.253 | 0.524 | -0.271 | -0.389 |
| 003 | 10 | 10 | 9 | 11 | 0.45 | 0.50 | -0.126 | 0.000 | -0.126 | 0.063 |
| 004 | 8 | 12 | 10 | 10 | 0.50 | 0.60 | 0.000 | 0.253 | -0.253 | -0.127 |
| 005 | 4 | 16 | 13 | 7 | 0.65 | 0.80 | 0.385 | 0.842 | -0.456 | -0.613 |
| 006 | 11 | 9 | 13 | 7 | 0.65 | 0.45 | 0.385 | -0.126 | 0.511 | -0.130 |
We can then compute basic statistical tests on these person-specific parameters. For example, we can ask whether sensitivity (dprime) was greater than zero.
Code
t.test(sdt$dprime) %>%
parameters()| Parameter | Mean | mu | Difference | CI | CI_low | CI_high | t | df_error | p | Method | Alternative |
|---|---|---|---|---|---|---|---|---|---|---|---|
| sdt$dprime | 0.031 | 0 | 0.031 | 0.95 | -0.074 | 0.137 | 0.589 | 105 | 0.557 | One Sample t-test | two.sided |
And whether participants show a liberal bias (negative criterion) or a conservative one (positive criterion).
Code
t.test(sdt$crit) %>%
parameters()| Parameter | Mean | mu | Difference | CI | CI_low | CI_high | t | df_error | p | Method | Alternative |
|---|---|---|---|---|---|---|---|---|---|---|---|
| sdt$crit | -0.232 | 0 | -0.232 | 0.95 | -0.284 | -0.179 | -8.75 | 105 | 0 | One Sample t-test | two.sided |
In these example data, sensitivity was not statistically significantly different from zero: On average, participants are unable to tell False from True stimuli, their sensitivity to truthiness was not significantly different from zero. On the other hand, the criterion was very low, indeed negative, so participants were generally liberal to answer “true”. We can use the aggregate parameters to redraw Figure 2 from above (Figure 3):
We can also visualize the participants’ parameters. Note that there were very few trials per participant, thus many participants will have identical parameter estimates. We therefore reduce overplotting by jittering the points a little bit.
Code
sdt %>%
ggplot(aes(crit, dprime)) +
geom_hline(yintercept = 0, lty = 2, linewidth = .25) +
geom_vline(xintercept = 0, lty = 2, linewidth = .25) +
labs(x = "Criterion", y = "Sensitivity") +
geom_smooth(
method = "lm",
linewidth = .33,
color = "black",
alpha = .2
) +
geom_point(
shape = 1,
position = position_jitter(.01, .01)
) +
stat_centroid(
.fun = mean, col = "red",
geom = "point", size = 2
) +
theme(aspect.ratio = 1)Relation with accuracy
Typically, although conceptually different and on different scales, proportion corrects are typically very similar to d-primes.
Code
pa <- left_join(
sdt,
d_accuracy_participant
) %>%
ggplot(aes(Accuracy, dprime)) +
scale_x_continuous(
"Percent correct",
breaks = extended_breaks(5),
labels = percent
) +
scale_y_continuous(
"Sensitivity",
breaks = extended_breaks(7)
) +
geom_hline(yintercept = 0, lty = 2, linewidth = .25) +
geom_vline(xintercept = 0.5, lty = 2, linewidth = .25) +
geom_smooth(
method = "lm",
linewidth = .33,
color = "black",
alpha = .2
) +
geom_point(shape = 1, position = position_jitter(.005, .01)) +
theme(aspect.ratio = 1)
pb <- last_plot() +
aes(y = crit) +
scale_y_continuous(
"Criterion",
breaks = extended_breaks(7)
)
pa | pbNow that we have gleaned some insight into the inner workings of the SDT framework, we turn to estimating SDT models as generalized linear mixed models (GLMMs).
Model 0
Above, we used traditional methods and formulas for calculating SDT parameters for each participant. We then used those person-specific estimates in a further statistical test. We can get equivalent average and participant-specific parameters more efficiently with a GLMM. We label this model with a zero to indicate that it is a “baseline” model: We do not yet include any parameters to estimate the effects of various experimental manipulations.
Coding of predictor variables
We first need to ensure the categorical predictors are used appropriately in the model. We “contrast-code” Veracity. This ensures the model intercept is “between” False and True trials, or their average. It then corresponds to 1 - criterion.
Model specification
We then define the GLMM as a mixed effects formula in R’s extended formula syntax. Below, we regress Answer (0 = “False”, 1 = “True”) on an intercept (in R syntax 1 means an intercept because it adds a column of 1s in the design matrix) and the slope of Veracity (whether the stimulus was True or False). We also specify both as correlated random effects across Participants, and the intercepts in addition as random across Stimulus
Code
f0 <- Answer ~ 1 + Veracity +
(1 + Veracity | Participant) +
(1 | Stimulus)With this syntax, the model is parameterized such that there will be two main parameters. The intercept describes the (negative) criterion: It is half the sum of the z-scores of the hit and false alarm rates, or the z-score of the “true” responses for the “average” stimulus. The slope of Veracity indicates sensitivity–the separation of the latent variables between False and True trials. This parameter is typically called d-prime in many SDT applications.
Prior distributions
Although it is optional, for illustrative purposes we define some vaguely informative prior distributions on the population-level parameters.
Sampling
With the model formula, data, and prior distribution, we can then sample from the model’s posterior distribution. In addition, it is critical here to use the Bernoulli outcome distribution with a probit link; this specifies that we assume normality of the latent variable. We also specify the “file” argument, because some models take a while to sample, with “file” we can load the model from a file instead in subsequent runs.
It is good practice to then check model convergence. For simple models there almost never are any issues. Nevertheless you should always at least use numerical model diagnostics to evaluate if the sampling has performed as intended. The most commonly used metric is called “Rhat”, and this number should be almost exactly 1.0 (within about 0.01 of 1.0). We will return to this below, when looking at the model’s numerical summary.
Another way to do this is to visually see whether the four chains of samples are well mixed:
Similarly, we want to see evidence that the model is able to reproduce the observed data:
Code
pp_check(m0, type = "bars_grouped", group = "Veracity", ndraws = 100) +
scale_x_continuous(breaks = c(0, 1), labels = c("False", "True"))Model summary
The default model summary is printed with summary(). Here we use an additional arugment to show the prior distributions used.
Code
summary(m0, prior = TRUE) Family: bernoulli
Links: mu = probit
Formula: Answer ~ 1 + Veracity + (1 + Veracity | Participant) + (1 | Stimulus)
Data: d (Number of observations: 4240)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Priors:
b ~ normal(0, 1)
Intercept ~ student_t(3, 0, 2.5)
L ~ lkj_corr_cholesky(1)
<lower=0> sd ~ student_t(3, 0, 1)
Group-Level Effects:
~Participant (Number of levels: 106)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sd(Intercept) 0.18 0.03 0.12 0.24 1.00 1623
sd(Veracity1) 0.37 0.06 0.24 0.48 1.00 1721
cor(Intercept,Veracity1) -0.14 0.23 -0.59 0.32 1.00 772
Tail_ESS
sd(Intercept) 1796
sd(Veracity1) 1526
cor(Intercept,Veracity1) 1234
~Stimulus (Number of levels: 21)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.67 0.12 0.48 0.95 1.01 831 1515
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.23 0.16 -0.08 0.55 1.03 235 507
Veracity1 -0.18 0.06 -0.30 -0.05 1.00 2407 2715
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
We first return to the “Rhat” number from above. Above, each population level effect has an Rhat value of ~1.0 and we are reassured of model convergence. If this was not the case, we would first try drawing more samples from the model (see ?brm).
After confirming that the model estimation algorithm has converged, and that the model doesn’t have glaring errors in reproducing the data, we can proceed to interpret the estimates. Instead of the above verbose output, we use parameters() to display a neat summary. The regression coefficients describe the criterion (-Intercept) and d-prime (slope):
Code
parameters(m0, centrality = "mean")| Parameter | Mean | CI | CI_low | CI_high | pd | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| b_Intercept | 0.232 | 0.95 | -0.078 | 0.554 | 0.937 | 1.03 | 217 |
| b_Veracity1 | -0.176 | 0.95 | -0.300 | -0.051 | 0.999 | 1.00 | 2392 |
The above numbers are the posterior means and 95%CIs. Those are calculated from the samples we drew from the posterior distribution using brm() above. Note that these summarise differ somewhat from the manually calculated basic SDT summaries above, because the model is different (random stimulus intercepts). Those samples can also be visualized:
Code
gather_draws(m0, b_Intercept, b_Veracity1) %>%
mutate(
.variable = factor(
.variable,
levels = c("b_Intercept", "b_Veracity1"),
labels = c("Criterion", "dprime")
)
) %>%
# Negate intercept to criterion
mutate(.value = if_else(.variable == "Criterion", -.value, .value)) %>%
ggplot(aes(.value, .variable)) +
scale_x_continuous(
"Parameter value",
breaks = extended_breaks(7)
) +
scale_y_discrete(
"Parameter",
expand = expansion(0.01)
) +
geom_vline(xintercept = 0, linewidth = 0.25) +
stat_halfeye(
adjust = 1.5,
slab_color = "black",
slab_linewidth = 0.25,
slab_fill = "grey80",
normalize = "xy",
height = 0.75
)Alternative parameterisation 1
The above parameterisation is the most straightforward and produces parameter estimates that directly reflect the (negative) criterion and d-prime. However, because the model is parameterized as a linear model with an intercept and a slope, the two Veracity conditions can end up having different priors even though that might not be intended.
It is then often useful to reparameterize the model such that the population level parameters indicate means of the latent variable for each level of veracity. That is, instead of estimating an intercept and a slope, we directly estimate two means. This ensures using the same prior on the two veracity conditions’ latent variable. Because we are treating Veracity as a factor in R, removing the intercepts with a 0 does the trick:
Code
f0a <- Answer ~ 0 + Veracity +
(0 + Veracity | Participant) +
(1 | Stimulus)
m0a <- brm(
f0a,
family = bernoulli(link = probit),
data = d,
file = "models/m0a"
)
parameters(m0a, centrality = "mean")| Parameter | Mean | CI | CI_low | CI_high | pd | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| b_VeracityFalse | 0.309 | 0.95 | -0.012 | 0.600 | 0.972 | 1.01 | 400 |
| b_VeracityTrue | 0.132 | 0.95 | -0.177 | 0.423 | 0.814 | 1.01 | 382 |
These parameters now reflect the model-predicted z-scores of the probabilities of answering “True” for False and True stimuli, respectively. We can convert them to SDT metrics as follows (note that we can here reverse the previous model’s intercept to directly give the criterion)
Code
as_draws_df(m0a, variable = "b_", regex = TRUE) %>%
mutate(
dprime = b_VeracityTrue - b_VeracityFalse,
criterion = -0.5 * (b_VeracityTrue + b_VeracityFalse)
) %>%
parameters(centrality = "mean")| Parameter | Mean | CI_low | CI_high | pd |
|---|---|---|---|---|
| b_VeracityFalse | 0.309 | -0.012 | 0.600 | 0.972 |
| b_VeracityTrue | 0.132 | -0.177 | 0.423 | 0.814 |
| dprime | -0.178 | -0.307 | -0.049 | 0.997 |
| criterion | -0.221 | -0.500 | 0.089 | 0.935 |
Model 1
We then proceed to a more complex model, and ask whether bias or sensitivity differ between the training groups. First, we ensure that “None” is the baseline level for Training.
Code
d <- d %>%
mutate(
Training = fct_relevel(Training, "None")
)Then, we can simply enter Training as a main effect and interaction with Veracity.
The default parameterization, and adding Training with dummy contrasts, leads to an intercept that is the negative criterion for Training = “None”. Differences in negative criterion between this baseline and the other two groups are the two Training main effects. The main effect of Veracity indicates dprime for the baseline group, and the two interactions are differences in dprime between the baseline and the two other training groups. We display summaries of the estimates below, while negating the intercepts to criteria
Code
gather_draws(m1, `b_.*`, regex = TRUE) %>%
mutate(
.value = if_else(
str_detect(.variable, "Veracity"),
.value,
-.value
)
) %>%
mean_qi()| .variable | .value | .lower | .upper | .width | .point | .interval |
|---|---|---|---|---|---|---|
| b_Intercept | -0.212 | -0.514 | 0.093 | 0.95 | mean | qi |
| b_TrainingBogus | -0.038 | -0.175 | 0.099 | 0.95 | mean | qi |
| b_TrainingEmotion | -0.022 | -0.160 | 0.112 | 0.95 | mean | qi |
| b_Veracity1 | -0.312 | -0.536 | -0.098 | 0.95 | mean | qi |
| b_Veracity1:TrainingBogus | 0.185 | -0.081 | 0.464 | 0.95 | mean | qi |
| b_Veracity1:TrainingEmotion | 0.184 | -0.083 | 0.462 | 0.95 | mean | qi |
We can then use functions in the emmeans package to calculate other quantities, such as converting differences and interactions to conditional means. However, we cannot here negate the intercepts to directly correspond to criteria.
Code
# Criteria
emm_m1_c1 <- emmeans(m1, ~Training) %>%
parameters(centrality = "mean")
# Differences in criteria
emm_m1_c2 <- emmeans(m1, ~Training) %>%
contrast("pairwise") %>%
parameters(centrality = "mean")
# Dprimes for three groups
emm_m1_d1 <- emmeans(m1, ~Veracity + Training) %>%
contrast("revpairwise", by = "Training") %>%
parameters(centrality = "mean")
# Differences between groups
emm_m1_d2 <- emmeans(m1, ~Veracity + Training) %>%
contrast(interaction = c("revpairwise", "pairwise")) %>%
parameters(centrality = "mean")
reduce(list(emm_m1_c1, emm_m1_c2, emm_m1_d1, emm_m1_d2), bind_rows) %>%
select(c(1:2, 4:5))| Parameter | Mean | CI_low | CI_high |
|---|---|---|---|
| None | 0.212 | -0.093 | 0.514 |
| Bogus | 0.250 | -0.056 | 0.543 |
| Emotion | 0.234 | -0.063 | 0.533 |
| None - Bogus | -0.038 | -0.175 | 0.099 |
| None - Emotion | -0.022 | -0.160 | 0.112 |
| Bogus - Emotion | 0.016 | -0.113 | 0.145 |
| True - False, None | -0.312 | -0.536 | -0.098 |
| True - False, Bogus | -0.127 | -0.320 | 0.065 |
| True - False, Emotion | -0.128 | -0.321 | 0.065 |
| True - False, None - Bogus | -0.185 | -0.464 | 0.081 |
| True - False, None - Emotion | -0.184 | -0.462 | 0.083 |
| True - False, Bogus - Emotion | 0.001 | -0.247 | 0.252 |
Typically, we are more interested in visualizations than tables. We saved the quantities above in objects, which we visualize below:
Code
bind_rows(
bind_rows(emm_m1_d1, emm_m1_d2) %>%
mutate(
Parameter = str_remove(Parameter, "True - False, ") %>%
fct_inorder(),
x = "dprime"
),
bind_rows(emm_m1_c1, emm_m1_c2) %>%
mutate(
Parameter = fct_inorder(Parameter),
x = "Negative criterion"
)
) %>%
mutate(
t = if_else(str_detect(Parameter, " - "), "Differences", "Group means") %>%
fct_inorder()
) %>%
ggplot(aes(Parameter, Mean)) +
labs(
x = "Training group (or difference)",
y = "Posterior mean and 95%CI"
) +
geom_hline(yintercept = 0, linewidth = .2) +
geom_pointrange(aes(ymin = CI_low, ymax = CI_high)) +
facet_grid(
x~t,
scales = "free"
)Alternative parameterization
As above, in the linear model parameterisation above, prior distributions on parameters might have unintended consequences. We therefore reparameterize the model to estimate means of the latent variable instead.
Code
parameters(m1a, centrality = "mean")| Parameter | Mean | CI | CI_low | CI_high | pd | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| b_VeracityFalse:TrainingNone | 0.375 | 0.95 | 0.039 | 0.712 | 0.986 | 1.01 | 553 |
| b_VeracityTrue:TrainingNone | 0.059 | 0.95 | -0.266 | 0.393 | 0.639 | 1.01 | 579 |
| b_VeracityFalse:TrainingBogus | 0.316 | 0.95 | -0.008 | 0.639 | 0.971 | 1.01 | 572 |
| b_VeracityTrue:TrainingBogus | 0.191 | 0.95 | -0.128 | 0.513 | 0.882 | 1.01 | 567 |
| b_VeracityFalse:TrainingEmotion | 0.297 | 0.95 | -0.015 | 0.625 | 0.967 | 1.01 | 601 |
| b_VeracityTrue:TrainingEmotion | 0.173 | 0.95 | -0.140 | 0.497 | 0.859 | 1.01 | 539 |
Calculating the SDT parameters then takes some more work, however. We need to manually calculate them for each group.
Code
x <- gather_draws(m1a, `b_.*`, regex = TRUE) %>%
separate(.variable, c("Veracity", "Training"), sep = ":") %>%
mutate(
Veracity = str_remove(Veracity, "b_Veracity"),
Training = str_remove(Training, "Training")
) %>%
pivot_wider(names_from = Veracity, values_from = .value) %>%
mutate(
Criterion = -0.5 * (True + False),
dprime = True - False
) %>%
select(-False, -True) %>%
pivot_longer(Criterion:dprime)
x2 <- x %>%
group_by(name) %>%
compare_levels(value, Training)Code
bind_rows(x, x2) %>%
mutate(
t = if_else(str_detect(Training, "-"), "Difference", "Mean") %>%
fct_inorder(),
Training = fct_inorder(Training)
) %>%
ggplot(aes(Training, value)) +
labs(
x = "Training group (or difference)",
y = "Posterior mean and 95%CI"
) +
geom_hline(yintercept = 0, linewidth = 0.25) +
stat_halfeye(
normalize = "xy",
width = 0.5,
slab_color = "black",
slab_linewidth = 0.25,
slab_fill = "grey80"
) +
facet_grid(
name~t,
scales = "free"
)Region of practical equivalence
Because our estimates are random samples from the posterior distribution, it is easy to calculate further inferential quantities. For example, we might ask whether the estimated criteria, dprimes, or their differences were practically equivalent to zero. To do so, we establish arbitrarily that the region of practical equivalence to zero is from -0.1 to 0.1. We then calculate percentages of each posterior distribution that lie within this ROPE.
Code
rope <- bind_rows(x, x2) %>%
mutate(
t = if_else(str_detect(Training, "-"), "Difference", "Mean") %>%
fct_inorder(),
Training = fct_inorder(Training)
) %>%
group_by(Training, name, t) %>%
group_modify(
~describe_posterior(
.$value,
test = "rope",
rope_ci = 1,
rope_range = c(-0.1, 0.1)
)
) %>%
select(1:3, ROPE_Percentage)Code
bind_rows(x, x2) %>%
mutate(
t = if_else(str_detect(Training, "-"), "Difference", "Mean") %>%
fct_inorder(),
Training = fct_inorder(Training)
) %>%
ggplot(aes(Training, value)) +
labs(
x = "Training group (or difference)",
y = "Posterior mean and 95%CI"
) +
geom_hline(yintercept = 0, linewidth = 0.25) +
scale_fill_manual(
"In ROPE",
values = c(alpha("dodgerblue1", .5), alpha("dodgerblue4", .75)),
aesthetics = c("slab_fill")
) +
stat_halfeye(
normalize = "xy",
width = 0.5,
slab_color = "black",
slab_linewidth = 0.25,
aes(slab_fill = after_stat(between(y, -0.1, 0.1)))
) +
geom_text(
data = rope,
aes(label = percent(ROPE_Percentage, 1)),
y = 0.05, nudge_x = -0.2
) +
facet_grid(
name~t,
scales = "free"
)Model 2
Finally, we include the within-person manipulation LieType as well.
Code
parameters(m2, centrality = "mean")| Parameter | Mean | CI | CI_low | CI_high | pd | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| b_Intercept | 0.151 | 0.95 | -1.226 | 1.531 | 0.589 | 1 | 2084 |
| b_Veracity1 | -0.293 | 0.95 | -0.521 | -0.058 | 0.992 | 1 | 2133 |
| b_TrainingBogus | 0.047 | 0.95 | -0.116 | 0.212 | 0.720 | 1 | 2435 |
| b_TrainingEmotion | 0.154 | 0.95 | -0.005 | 0.315 | 0.972 | 1 | 2292 |
| b_LieTypeExperiential | 0.118 | 0.95 | -1.291 | 1.547 | 0.568 | 1 | 1978 |
| b_Veracity1:TrainingBogus | 0.100 | 0.95 | -0.219 | 0.415 | 0.741 | 1 | 2674 |
| b_Veracity1:TrainingEmotion | 0.186 | 0.95 | -0.127 | 0.489 | 0.881 | 1 | 2253 |
| b_Veracity1:LieTypeExperiential | 0.399 | 0.95 | -0.334 | 1.148 | 0.862 | 1 | 964 |
| b_TrainingBogus:LieTypeExperiential | -0.019 | 0.95 | -0.233 | 0.194 | 0.573 | 1 | 3477 |
| b_TrainingEmotion:LieTypeExperiential | -0.290 | 0.95 | -0.498 | -0.076 | 0.997 | 1 | 3082 |
| b_Veracity1:TrainingBogus:LieTypeExperiential | 0.202 | 0.95 | -0.270 | 0.668 | 0.803 | 1 | 3041 |
| b_Veracity1:TrainingEmotion:LieTypeExperiential | 0.013 | 0.95 | -0.445 | 0.479 | 0.519 | 1 | 2590 |
From these estimates, we can again use emmeans to produce other inferential quantities, like differences:
Code
emm_m2_d1 <- emmeans(m2, ~Veracity | Training * LieType) %>%
contrast("revpairwise") %>%
parameters(centrality = "mean")
emm_m2_d2 <- emmeans(m2, ~Veracity + Training * LieType) %>%
contrast(interaction = c("revpairwise", "pairwise"), by = "LieType") %>%
parameters(centrality = "mean")
# Criteria
emm_m2_c1 <- emmeans(m2, ~Training * LieType) %>%
parameters(centrality = "mean")
emm_m2_c2 <- emmeans(m2, ~Training | LieType) %>%
contrast("pairwise") %>%
parameters(centrality = "mean")Code
# Dprimes
list(emm_m2_c1, emm_m2_c2, emm_m2_d1, emm_m2_d2) %>%
map(~tibble(.)) %>%
bind_rows() %>%
mutate(
p = if_else(str_detect(Parameter, "True - False"), "dprime", "Criterion"),
Parameter = str_remove(Parameter, "True - False, ")
) %>%
separate(Parameter, c("Training", "LieType"), sep = ", ") %>%
mutate(
t = if_else(str_detect(Training, " - "), "Differences", "Group means") %>%
fct_inorder(),
Training = fct_inorder(Training)
) %>%
ggplot(aes(Training, Mean, col = LieType)) +
labs(
x = "Training group (or difference)",
y = "Posterior mean and 95%CI"
) +
geom_hline(yintercept = 0, linewidth = .25) +
geom_pointrange(
aes(ymin = CI_low, ymax = CI_high),
position = position_dodge(width = .25)
) +
facet_grid(p~t, scales = "free")Alternative parameterization
Code
parameters(m2a)| Parameter | Median | CI | CI_low | CI_high | pd | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| b_TrainingNone:LieTypeAffective | 0.148 | 0.95 | -1.216 | 1.505 | 0.583 | 1.00 | 1666 |
| b_TrainingBogus:LieTypeAffective | 0.200 | 0.95 | -1.202 | 1.564 | 0.614 | 1.00 | 1691 |
| b_TrainingEmotion:LieTypeAffective | 0.303 | 0.95 | -1.112 | 1.653 | 0.671 | 1.00 | 1684 |
| b_TrainingNone:LieTypeExperiential | 0.273 | 0.95 | -0.053 | 0.612 | 0.950 | 1.01 | 933 |
| b_TrainingBogus:LieTypeExperiential | 0.298 | 0.95 | -0.014 | 0.632 | 0.969 | 1.01 | 905 |
| b_TrainingEmotion:LieTypeExperiential | 0.139 | 0.95 | -0.187 | 0.469 | 0.800 | 1.01 | 895 |
| b_TrainingNone:LieTypeAffective:Veracity1 | -0.294 | 0.95 | -0.522 | -0.070 | 0.995 | 1.00 | 5535 |
| b_TrainingBogus:LieTypeAffective:Veracity1 | -0.189 | 0.95 | -0.393 | 0.017 | 0.966 | 1.00 | 5909 |
| b_TrainingEmotion:LieTypeAffective:Veracity1 | -0.106 | 0.95 | -0.316 | 0.096 | 0.854 | 1.00 | 5600 |
| b_TrainingNone:LieTypeExperiential:Veracity1 | 0.097 | 0.95 | -0.606 | 0.800 | 0.618 | 1.01 | 814 |
| b_TrainingBogus:LieTypeExperiential:Veracity1 | 0.405 | 0.95 | -0.258 | 1.083 | 0.888 | 1.01 | 730 |
| b_TrainingEmotion:LieTypeExperiential:Veracity1 | 0.303 | 0.95 | -0.363 | 0.987 | 0.822 | 1.01 | 821 |
Conclusion
TBD
Further reading
The GLMM analysis of SDT models in R is based on Vuorre (2017), which in turn drew on discussions in Rouder and Lu (2005), Rouder et al. (2007), DeCarlo (1998), DeCarlo (2010), and Decarlo (2003). Macmillan and Creelman (2005) is a classic introductory text on SDT, and Stanislaw and Todorov (1999) discusses the calculations required for SDT metrics in some detail. Bürkner (2017) is a good introductory text on the brms package; Wickham and Grolemund (2016) is an excellent book on R; and McElreath (2016) is a recommended introductory textbook on bayesian statistics and probabilistic modelling.
References
Appendix
Simpler Model 0
This is just to show that the estimated values reflect manually calculated closely when we drop stimulus random effects
Code
f0a2 <- Answer ~ 1 + Veracity + (1 + Veracity | Participant)
m0a2 <- brm(
formula = f0a2,
family = bernoulli(link = probit),
data = d,
file = "models/m0a2"
)
f0a3 <- Answer ~ 0 + Veracity + (0 + Veracity | Participant)
m0a3 <- brm(
formula = f0a3,
family = bernoulli(link = probit),
data = d,
file = "models/m0a3"
)
parameters(m0a2, centrality = "mean")| Parameter | Mean | CI | CI_low | CI_high | pd | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| b_Intercept | 0.218 | 0.95 | 0.170 | 0.267 | 1.000 | 1 | 3504 |
| b_Veracity1 | 0.030 | 0.95 | -0.066 | 0.125 | 0.737 | 1 | 4315 |
Code
as_draws_df(m0a3, variable = "b_", regex = TRUE) %>%
mutate(
dprime = b_VeracityTrue - b_VeracityFalse,
crit = -0.5 * (b_VeracityTrue + b_VeracityFalse)
) %>%
parameters(centrality = "mean")| Parameter | Mean | CI_low | CI_high | pd |
|---|---|---|---|---|
| b_VeracityFalse | 0.204 | 0.134 | 0.274 | 1.000 |
| b_VeracityTrue | 0.233 | 0.167 | 0.300 | 1.000 |
| dprime | 0.029 | -0.063 | 0.123 | 0.716 |
| crit | -0.218 | -0.266 | -0.171 | 1.000 |
Code
sdt %>%
select(Participant, zfa, zhr, dprime, crit) %>%
pivot_longer(-Participant) %>%
summarise(t = parameters(t.test(value)), .by = name)| name | t |
|---|---|
| zfa | value |
| zhr | value |
| dprime | value |
| crit | value |
Code
f0a4 <- bf(
Answer ~ Phi(dprime * Veracity - criterion),
dprime ~ 1 + (1 |s| Participant),
criterion ~ 1 + (1 |s| Participant),
nl = TRUE
)
p0a4 <- prior(normal(0, 1), nlpar = "dprime") +
prior(normal(0, 1), nlpar = "criterion") +
prior(normal(0, 1), nlpar = "dprime", class = "sd") +
prior(normal(0, 1), nlpar = "criterion", class = "sd")
m0a4 <- brm(
f0a4,
family = bernoulli(link = "identity"),
data = d,
prior = p0a4,
file = "models/m0a4"
)
parameters(m0a3, centrality = "mean")| Parameter | Mean | CI | CI_low | CI_high | pd | Rhat | ESS |
|---|---|---|---|---|---|---|---|
| b_VeracityFalse | 0.204 | 0.95 | 0.134 | 0.274 | 1 | 1 | 3889 |
| b_VeracityTrue | 0.233 | 0.95 | 0.167 | 0.300 | 1 | 1 | 4984 |
Receiving Operator Characteristic
People sometimes like talking about the receiver operating characteristic and area under the curve measures. At least they allow for nice plots (but those require some hairy code).
The ROC plots hit rates against false alarm rates for varying threshold levels. However, we only get one threshold (per participant):
Code
sdt %>%
ggplot(aes(far, hr)) +
geom_abline(lty = 2, linewidth = .33) +
scale_x_continuous(
"False alarm rate",
limits = c(0, 1),
expand = expansion(0.01)
) +
scale_y_continuous(
"Hit rate",
limits = c(0, 1),
expand = expansion(0.01)
) +
geom_smooth(
method = "lm",
linewidth = .33,
color = "black",
alpha = .2
) +
geom_point(
shape = 1,
position = position_jitter(.01, .01)
) +
stat_centroid(
.fun = mean, col = "red",
geom = "point", size = 2
) +
theme(aspect.ratio = 1)One solution to this is to plot the estimated hit rate for all false alarm rates. We do so here for the average participant. We display 100 random ROCs from the posterior to display uncertainty.
Code
grid <- tibble(
x = seq(-2.5, 2.5, length.out = 30),
far = pnorm(x, lower = FALSE),
zfar = qnorm(far)
)
rocs <- as_draws_df(m0a) %>%
slice_sample(n = 100) %>%
select(starts_with("b_")) %>%
rownames_to_column("i") %>%
crossing(grid) %>%
mutate(hr = pnorm(x, b_VeracityTrue - b_VeracityFalse, lower = FALSE))Code
rocs %>%
ggplot(aes(far, hr)) +
geom_abline(lty = 2, linewidth = .33) +
scale_x_continuous(
"False alarm rate",
limits = c(0, 1),
expand = expansion(0.01)
) +
scale_y_continuous(
"Hit rate",
limits = c(0, 1),
expand = expansion(0.01)
) +
geom_line(
aes(group = i),
alpha = .25,
linewidth = .33
) +
theme(
aspect.ratio = 1
)Reuse
Citation
@online{vuorre2023,
author = {Vuorre, Matti and Zloteanu, Mircea},
title = {Analyzing {Deception} {Data} with {Hierarchical} {Bayesian}
{Signal} {Detection} {Theoretic} {Models} {(Computational} Notebook
Supplement to {TBD)}},
date = {2023-04-21},
langid = {en}
}